Accuracy bounds for ensembles under 0-1 loss
Author
Abstract
This paper attempts to improve the quantitative understanding of how ensembles behave for discrete variables. A set of tight upper and lower bounds on ensemble accuracy is presented for wide classes of ensemble algorithms, including bagging and boosting. The ensemble accuracy is expressed in terms of the accuracies of the ensemble members. Since those bounds represent best- and worst-case behavior only, we also study typical behavior and discuss its properties. A parameterized bound is presented which describes ensemble behavior as a mixture of dependent base classifier and independent base classifier areas. Some empirical results are presented to support our conclusions.
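As a rough illustration of what it means to express ensemble accuracy in terms of member accuracies, the sketch below considers a simplified setting that is an assumption here, not taken from the paper: binary majority voting over an odd number `n` of base classifiers whose average accuracy is `p`. It computes the accuracy under fully independent members (a binomial tail) and the best and worst cases from a standard vote-counting argument. The function names are hypothetical, and the paper's own bounds may take a different form.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Majority-vote accuracy for n independent classifiers, each correct
    with probability p. Assumes n is odd so that ties cannot occur."""
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    k = n // 2 + 1  # minimum number of correct votes for a correct majority
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def majority_vote_bounds(n: int, p: float) -> tuple[float, float]:
    """Worst- and best-case majority-vote accuracy when the members have
    average accuracy p and may be arbitrarily dependent (counting argument).

    Best case: every correctly classified example uses exactly k correct
    votes, so accuracy <= n*p / k.
    Worst case: every misclassified example still absorbs k-1 correct
    votes, so accuracy >= (n*p - (k-1)) / (n - k + 1).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    k = n // 2 + 1
    upper = min(1.0, n * p / k)
    lower = max(0.0, (n * p - (k - 1)) / (n - k + 1))
    return lower, upper


if __name__ == "__main__":
    n, p = 3, 0.6
    low, high = majority_vote_bounds(n, p)
    typical = majority_vote_accuracy(n, p)
    print(f"worst case {low:.3f}, independent {typical:.3f}, best case {high:.3f}")
```

In this simplified setting the independent case always falls between the two extremes, which is one way to read the abstract's distinction between dependent and independent base classifier regimes.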
Similar resources
On Achievable Rates and Complexity of LDPC Codes for Parallel Channels with Application to Puncturing
This paper considers the achievable rates and decoding complexity of low-density parity-check (LDPC) codes over statistically independent parallel channels. The paper starts with the derivation of bounds on the conditional entropy of the transmitted codeword given the received sequence at the output of the parallel channels; the component channels are considered to be memoryless, binary-input, ...
On Universal Properties of Capacity-Approaching LDPC Ensembles
This paper provides some universal information-theoretic bounds related to capacity-approaching ensembles of low-density parity-check (LDPC) codes. These bounds refer to the behavior of the degree distributions of such ensembles, and also to the graphical complexity and the fundamental system of cycles associated with the Tanner graphs of LDPC ensembles. The transmission of these ensembles is a...
On Universal Properties of the Degree Distributions and Cycles of Capacity-Approaching LDPC Ensembles
This paper provides some universal information-theoretic bounds related to the degree distributions and the average cardinality of the fundamental system of cycles of low-density parity-check (LDPC) ensembles. The transmission of these ensembles is assumed to take place over an arbitrary memoryless binary-input output-symmetric (MBIOS) channel, and the bounds are expressed in terms of the gap b...
Bounds for Validation
In this paper we derive the bounds for Validation (known also as Hold-Out Estimate and Train-and-Test Method). We present the best possible bound in the case of 0-1 valued loss function. We also provide the tables where the least sample size is calculated that is necessary for obtaining the bound for a given estimation rate and reliability of estimation. For an arbitrary bounded loss function w...
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. It lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...